Search | Portal Regional da BVS

1.

The influence of listener experience, measurement scale and speech task on the reliability of auditory-perceptual evaluation of vocal quality.

Alves, Jônatas do Nascimento; Almeida, Anna Alice Figueiredo de; Yamasaki, Rosiane; Lopes, Leonardo Wanderley.

Codas ; 36(3): e20230175, 2024.

Article in English | MEDLINE | ID: mdl-38629682

ABSTRACT

PURPOSE: To assess the influence of listener experience, measurement scale and type of speech task on the auditory-perceptual evaluation of the overall severity (OS) of voice deviation and the predominant type of voice (rough, breathy or strained). METHODS: Twenty-two listeners, divided into four groups, participated in the study: speech-language pathologists specialized in voice (SLP-V), SLPs not specialized in voice (SLP-NV), graduate students with auditory-perceptual analysis training (GS-T), and graduate students without auditory-perceptual analysis training (GS-U). The listeners rated the OS of voice deviation and the predominant type of voice of 44 voices using a visual analog scale (VAS) and a numerical scale (score "G" from GRBAS), across six speech tasks: sustained vowel /a/, sustained vowel /ɛ/, sentences, number counting, running speech, and all five previous tasks together. RESULTS: Sentences obtained the best interrater reliability in each group, using both VAS and GRBAS. The SLP-NV group demonstrated the best interrater reliability in OS judgment in different speech tasks using VAS or GRBAS. Sustained vowels (/a/ and /ɛ/) and running speech obtained the best interrater reliability among the groups of listeners in judging the predominant vocal quality. The GS-T group achieved the best interrater reliability in judging the predominant vocal quality. CONCLUSION: The length of experience in the auditory-perceptual judgment of the voice, the type of training the listeners received, and the type of speech task influence the reliability of the auditory-perceptual evaluation of vocal quality.

Subjects

Dysphonia; Speech Perception; Humans; Speech; Reproducibility of Results; Speech Production Measurement; Observer Variation; Voice Quality; Speech Acoustics

2.

Interrater Variability among Anaesthesiologists Using American Society of Anesthesiologists Physical Status Classification System.

Sharma Bhattarai, Amit; Bista, Navindra Raj; Basnet, Madindra Bahadur; Joshi, Deepak Raj; Shrestha, Anil.

J Nepal Health Res Counc ; 21(4): 543-549, 2024 Mar 31.

Article in English | MEDLINE | ID: mdl-38616581

ABSTRACT

BACKGROUND: The American Society of Anesthesiologists Physical Status classification is used by anaesthesiologists worldwide to classify surgical patients. Many studies have found a moderate degree of interrater variability among anaesthesiologists. The general objective of the study was to determine interrater variability among Nepalese anaesthesiologists using this classification system. The specific objectives were to determine the correctness of assignment and the interrater variability among anaesthesiologists based on their experience. METHODS: Ten clinical cases were distributed among 130 registered anaesthesiologist practitioners of Nepal after validation with experts. Respondents were asked to assign each of the ten cases to a specific physical status class. Anaesthesiologists were classified into two groups based on clinical experience: more or less than five years. RESULTS: We found substantial agreement in the < 5 years' experience group (0.66), in the > 5 years' experience group (0.753), and among all raters (0.736). The mean score of the group with less than 5 years of experience was higher, but the difference between the mean scores was not significant (p = 0.595). The overall mean score for both groups was 5.66 with SD 1.66. CONCLUSIONS: The study shows that there is very little variation among registered practising anaesthesiologists of Nepal using the American Society of Anesthesiologists Physical Status classification system.

Subjects

Anesthesiologists; Observer Variation; Physical Examination; Humans; Nepal; South Asian People; Physical Examination/classification

3.

Novel Scoring Scale for Quality Assessment of Lung Ultrasound in the Emergency Department.

Balderston, Jessica R; Brittan, Taylor; Kimura, Bruce J; Wang, Chen; Tozer, Jordan.

West J Emerg Med ; 25(2): 264-267, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38596928

ABSTRACT

Introduction: The use of a reliable scoring system for quality assessment (QA) is imperative to limit inconsistencies in measuring ultrasound acquisition skills. The current grading scale used for QA endorsed by the American College of Emergency Physicians (ACEP) is non-specific, applies irrespective of the type of study performed, and has not been rigorously validated. Our goal in this study was to determine whether a succinct, organ-specific grading scale designed for lung-specific QA would be more precise with better interobserver agreement. Methods: This was a prospective validation study of an objective QA scale for lung ultrasound (LUS) in the emergency department. We identified the first 100 LUS performed in normal clinical practice in the year 2020. Four reviewers at an urban academic center who were either emergency ultrasound fellowship-trained or current fellows with at least six months of QA experience scored each study, resulting in a total of 400. The primary outcome was the level of agreement between the reviewers. Our secondary outcome was the variability of the scores given to the studies. For the agreement between reviewers, we computed the intraclass correlation coefficient (ICC) based on a two-way random-effect model with a single rater for each grading scale. We generated 10,000 bootstrapped ICCs to construct 95% confidence intervals (CI) for both grading systems. A two-sided one-sample t-test was used to determine whether there were differences in the bootstrapped ICCs between the two grading systems. Results: The ICC between reviewers was 0.552 (95% CI 0.40-0.68) for the ACEP grading scale and 0.703 (95% CI 0.59-0.79) for the novel grading scale (P < 0.001), indicating significantly more interobserver agreement using the novel scale compared to the ACEP scale. The variance of scores was similar (0.93 and 0.92 for the novel and ACEP scales, respectively). 
Conclusion: We found an increased interobserver agreement between reviewers when using the novel, organ-specific scale when compared with the ACEP grading scale. Increased consistency in feedback based on objective criteria directed to the specific, targeted organ provides an opportunity to enhance learner education and satisfaction with their ultrasound education.
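
The agreement statistic described in the Methods above can be sketched as follows. This is a minimal illustration, assuming the common Shrout-Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single rater) and a percentile bootstrap that resamples studies with replacement; the abstract does not state how the bootstrap was drawn, and the function names and scores below are hypothetical, not taken from the study.

```python
import random


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per rated study; each row holds one score per reviewer.
    """
    n = len(ratings)        # number of rated studies
    k = len(ratings[0])     # number of reviewers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-study mean square
    msc = ss_cols / (k - 1)                 # between-reviewer mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bootstrap_icc_ci(ratings, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for ICC(2,1), resampling studies (assumed scheme)."""
    rng = random.Random(seed)
    n = len(ratings)
    stats = []
    for _ in range(n_boot):
        sample = [ratings[rng.randrange(n)] for _ in range(n)]
        try:
            stats.append(icc_2_1(sample))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with no usable variance
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Illustrative scores only (6 studies x 4 reviewers), not data from the study.
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(scores), 3))
print(bootstrap_icc_ci(scores))
```

Running the same two functions on the scores from each grading scale would give one ICC and one interval per scale, which is the comparison the abstract reports.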

Subjects

Emergency Service, Hospital; Lung; Humans; Lung/diagnostic imaging; Prospective Studies; Ultrasonography; Educational Status; Observer Variation; Reproducibility of Results

4.

Reproducibility assessment of rapid strains in cardiac MRI: Insights and recommendations for clinical application.

Halfmann, Moritz C; Hopman, Luuk H G A; Körperich, Hermann; Blaszczyk, Edyta; Gröschel, Jan; Schulz-Menger, Jeanette; Salatzki, Janek; André, Florian; Friedrich, Silke; Emrich, Tilman.

Eur J Radiol ; 174: 111386, 2024 May.

Article in English | MEDLINE | ID: mdl-38447431

ABSTRACT

PURPOSE: Studies have shown the incremental value of strain imaging in various cardiac diseases. However, reproducibility and generalizability have remained issues of concern. To overcome this, simplified algorithms such as rapid atrioventricular strains have been proposed. This multicenter study aimed to assess the reproducibility of rapid strains in a real-world setting and identify potential predictors for higher interobserver variation. METHODS: A total of 4 sites retrospectively identified 80 patients and 80 healthy controls who had undergone cardiac magnetic resonance imaging (CMR) at their respective centers using locally available scanners with respective field strengths and imaging protocols. Strain and volumetric parameters were measured at each site and then independently re-evaluated by a blinded core lab. Intraclass correlation coefficients (ICC) and Bland-Altman plots were used to assess inter-observer agreement. In addition, backward multiple linear regression analysis was performed to identify predictors for higher inter-observer variation. RESULTS: There was excellent agreement between sites in feature-tracking and rapid strain values (ICC ≥ 0.96). Bland-Altman plots showed no significant bias. Bi-atrial feature-tracking and rapid strains showed equally excellent agreement (ICC ≥ 0.96) but broader limits of agreement (≤18.0 % vs. ≤3.5 %). Regression analysis showed that higher field strength and lower temporal resolution (>30 ms) independently predicted reduced interobserver agreement for bi-atrial strain parameters (β = 0.38, p = 0.02 for field strength and β = 0.34, p = 0.02 for temporal resolution). CONCLUSION: Simplified rapid left ventricular and bi-atrial strain parameters can be reliably applied in a real-world multicenter setting. Based on the results of the regression analysis, a minimum temporal resolution of 30 ms is recommended when assessing atrial deformation.

Subjects

Magnetic Resonance Imaging, Cine; Magnetic Resonance Imaging; Humans; Retrospective Studies; Reproducibility of Results; Magnetic Resonance Imaging, Cine/methods; Heart Atria; Observer Variation; Ventricular Function, Left

5.

A Novel Method for the Measurement of the Vaginal Wall Thickness by Transvaginal Ultrasound: A Study of Inter- and Intra-Observer Reliability.

Bosio, Sara; Barba, Marta; Vigna, Annalisa; Cola, Alice; De Vicari, Desirèe; Costa, Clarissa; Volontè, Silvia; Frigerio, Matteo.

Medicina (Kaunas) ; 60(3), 2024 Feb 22.

Article in English | MEDLINE | ID: mdl-38541095

ABSTRACT

Background and Objectives: A consensus regarding the optimal sonographic technique for measuring vaginal wall thickness (VWT) is still absent in the literature. This study aims to validate a new method for measuring VWT using a biplanar transvaginal ultrasound probe and assess both its intra-operator and inter-operator reproducibility. Material and Methods: This prospective study included patients with genitourinary syndrome of menopause-related symptoms. Women were scanned using a BK Medical Flex Focus 400 with the 65 × 5.5 mm linear longitudinal transducer of an endovaginal biplanar probe (BK Medical probe 8848, BK Ultrasound, Peabody, MA, USA). VWT measurements were acquired from the anterior and posterior vaginal wall at three levels. Results: An inter-observer analysis revealed good consistency between operators at every anatomical site, and the intraclass correlation coefficient ranged from 0.931 to 0.987, indicating high reliability. An intra-observer analysis demonstrated robust consistency in VWT measurements, with an intraclass correlation coefficient exceeding 0.9 for all anatomical sites. Conclusions: The measurement of vaginal wall thickness performed by transvaginal biplanar ultrasound was easy and demonstrated good intra- and inter-operator reliability.

Subjects

Vagina; Humans; Female; Reproducibility of Results; Prospective Studies; Observer Variation; Ultrasonography; Vagina/diagnostic imaging

6.

Evaluating inter-rater reliability in the context of "Sysmex UN2000 detection of protein/creatinine ratio and of renal tubular epithelial cells can be used for screening lupus nephritis": a statistical examination.

Li, Ming; Gao, Qian; Yang, Jing; Yu, Tianfei.

BMC Nephrol ; 25(1): 94, 2024 Mar 13.

Article in English | MEDLINE | ID: mdl-38481181

ABSTRACT

BACKGROUND: The evaluation of inter-rater reliability (IRR) is integral to research designs involving the assessment of observational ratings by two raters. However, existing literature is often heterogeneous in reporting statistical procedures and the evaluation of IRR, although such information can impact subsequent hypothesis testing analyses. METHODS: This paper evaluates a recent publication by Chen et al., featured in BMC Nephrology, aiming to introduce an alternative statistical approach to assessing IRR and discuss its statistical properties. The study underscores the crucial need for selecting appropriate kappa statistics, emphasizing the accurate computation, interpretation, and reporting of commonly used IRR statistics between two raters. RESULTS: Cohen's kappa statistic is typically used for two raters dealing with two categories or for unordered categorical variables having three or more categories. On the other hand, when assessing the concordance between two raters for ordered categorical variables with three or more categories, the commonly employed measure is the weighted kappa. CONCLUSION: Chen and colleagues might have underestimated the agreement between AU5800 and UN2000. Although the statistical approach adopted in Chen et al.'s research did not alter their findings, it is important that researchers be discerning in their choice of statistical techniques to address their specific research inquiries.

Subjects

Lupus Nephritis; Humans; Creatinine; Reproducibility of Results; Lupus Nephritis/diagnosis; Observer Variation; Epithelial Cells

7.

Inter- and intra-observer reliability and agreement of O2Pulse inflection during cardiopulmonary exercise testing: A comparison of subjective and novel objective methodology.

Nickolay, Thomas; McGregor, Gordon; Powell, Richard; Begg, Brian; Birkett, Stefan; Nichols, Simon; Ennis, Stuart; Banerjee, Prithwish; Shave, Rob; Metcalfe, James; Hoye, Angela; Ingle, Lee.

PLoS One ; 19(3): e0299486, 2024.

Article in English | MEDLINE | ID: mdl-38452129

ABSTRACT

Cardiopulmonary exercise testing (CPET) is the 'gold standard' method for evaluating functional capacity, with oxygen pulse (O2Pulse) inflections serving as a potential indicator of myocardial ischaemia. However, the reliability and agreement of identifying these inflections have not been thoroughly investigated. This study aimed to assess the inter- and intra-observer reliability and agreement of a subjective quantification method for identifying O2Pulse inflections during CPET, and to propose a more robust and objective novel algorithm as an alternative methodology. A retrospective analysis was conducted using baseline data from the HIIT or MISS UK trial. The O2Pulse curves were visually inspected by two independent examiners, and compared against an objective algorithm. Fleiss' Kappa was used to determine the reliability of agreement between the three groups of observations. The results showed almost perfect agreement between the algorithm and both examiners, with a Fleiss' Kappa statistic of 0.89. The algorithm also demonstrated excellent inter-rater reliability (ICC) when compared to both examiners (0.92-0.98). However, a significant level (P ≤0.05) of systematic bias was observed in Bland-Altman analysis for comparisons involving the novice examiner. In conclusion, this study provides evidence for the reliability of both subjective and novel objective methods for identifying inflections in O2Pulse during CPET. These findings suggest that further research into the clinical significance of O2Pulse inflections is warranted, and that the adoption of a novel objective means of quantification may be preferable to ensure equality of outcome for patients.
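
Fleiss' kappa, the statistic named above for agreement across the three sets of observations, can be sketched as below. The function name and the counts are hypothetical; the example assumes three observers (for instance an algorithm and two examiners) classifying each curve into one of two categories, and it is not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a fixed number of raters per subject.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Overall share of all assignments that fell in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Per-subject agreement: proportion of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Hypothetical counts: columns are [inflection present, inflection absent].
curves = [[3, 0], [3, 0], [0, 3], [2, 1], [0, 3]]
print(round(fleiss_kappa(curves), 3))
```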

Subjects

Exercise Test; Humans; Exercise Test/methods; Observer Variation; Reproducibility of Results; Retrospective Studies; Clinical Trials as Topic

8.

Comparison of the Reliability of the House-Brackmann, Facial Nerve Grading System 2.0, and Sunnybrook Facial Grading System for the Evaluation of Patients with Peripheral Facial Paralysis.

Mengi, Erdem; Orhan Kara, Cüneyt; Necdet Ardiç, Fazil; Topuz, Bülent; Metin, Ulas; Alptürk, Ugur; Aydemir, Gökçe; Senol, Hande.

J Int Adv Otol ; 20(1): 14-18, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38454283

ABSTRACT

BACKGROUND: To compare the reliability of the House-Brackmann (HB), Facial Nerve Grading System 2.0 (FNGS 2.0), and Sunnybrook Facial Grading System (SB) which are widely used in the evaluation of peripheral facial paralysis (PFP) patients. METHODS: Thirty-five video-recorded adult PFP patients were included in the study. The evaluators comprised 6 physicians. Evaluations were conducted twice independently, utilizing video recordings. Simultaneously, the evaluators were asked to keep time during the evaluation. For the analysis of reliability, Fleiss' kappa coefficient was used for the HB, and the intraclass correlation coefficient (ICC) was used for the FNGS 2.0 and SB. RESULTS: The mean evaluation time of 1 patient was found to be 1.06 ± 0.24, 1.47 ± 0.23, and 2.32 ± 0.41 minutes for the HB, FNGS 2.0, and SB, respectively. For interrater reliability, Fleiss' kappa for the HB was 0.495 and 0.403; ICC for the FNGS 2.0 was 0.966 and 0.958; ICC for the SB was 0.960 and 0.967 for the first and second measurements, respectively. For intrarater reliability, Fleiss' kappa for the HB was 0.391, 0.446, 0.564, 0.502, 0.626, and 0.455; ICC for the FNGS 2.0 was 0.87, 0.982, 0.966, 0.929, 0.933, and 0.948; ICC for the SB was 0.935, 0.96, 0.895, 0.941, 0.96, and 0.94 for the 6 raters, respectively. CONCLUSION: In the present study, statistically high intra- and interrater correlations were found for the FNGS 2.0 and SB, while a moderate correlation was found for the HB. Although the HB seems to be more practical, it has been concluded that the FNGS 2.0 and SB are more reliable.

Subjects

Facial Paralysis; Adult; Humans; Facial Paralysis/diagnosis; Facial Nerve; Reproducibility of Results; Observer Variation; Face

9.

Development and Validation of the Bilingual Catalan/Spanish Cross-Cultural Adaptation of the Consensus Auditory-Perceptual Evaluation of Voice.

Calaf, Neus; Garcia-Quintana, David.

J Speech Lang Hear Res ; 67(4): 1072-1089, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38527275

ABSTRACT

PURPOSE: This study aimed to develop a valid and reliable bilingual version of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) for the auditory-perceptual evaluation of voice in Catalan and Spanish speakers. METHOD: The development of this CAPE-V adaptation included Delphi methodology with 20 voice and speech experts reaching consensus on the optimal adapted terminology of the perceptual vocal attributes, considering also input from the original instrument authors. The adaptation and validation of vocal tasks followed a sequential validation procedure, with input from phoneticians and speech-language pathologists. Following pilot testing with a large sample of speech-language pathology students, a refined adapted version was empirically tested for validity and reliability. Concurrent validity was assessed by comparing the adapted CAPE-V with the reference Grade, Roughness, Breathiness, Asthenia, Strain scale. Construct validity was assessed through convergent and discriminant validity analysis. Intrarater and interrater reliability were assessed via intraclass correlation coefficient calculations. User experience was evaluated through a questionnaire. Scale properties were validated using a confusion matrix, and cutoff values were calculated to achieve the optimal balance between sensitivity and specificity. RESULTS: Through a formalized consensus process, optimal Catalan/Spanish terminology was determined for the perceptual attributes of voice present in the CAPE-V. An adapted protocol of tasks was obtained that preserves the objectives of the original instrument and the relevance of the phonetic criteria in the target languages. The results demonstrated concurrent validity, construct validity, and intrarater reliability. Interrater reliability was found to depend on the extent to which evaluators shared their internal standards. The raters identified CAPE-V as an effective and preferred instrument. 
CONCLUSION: An adapted, validated version of the CAPE-V is made available to clinical professionals for the evaluation of voice in Catalan and Spanish speakers.

Subjects

Dysphonia; Humans; Cross-Cultural Comparison; Consensus; Reproducibility of Results; Voice Quality; Observer Variation

10.

Intraobserver and interobserver agreement of 8 segmental reflexes in healthy dogs.

Chiang, Bryan; Garcia, Gabriel; Leverone, Francesco; Hernandez, Jorge A; Carrera-Justiz, Sheila.

J Vet Intern Med ; 38(2): 1101-1110, 2024.

Article in English | MEDLINE | ID: mdl-38339888

ABSTRACT

BACKGROUND: No available literature supports the claim that the patellar and withdrawal (flexor) reflexes are the only reliable segmental reflexes in dogs. OBJECTIVE: Measure intra- and interobserver agreement of 8 segmental reflexes in dogs without clinical evidence of orthopedic or neurologic disease. ANIMALS: One-hundred and one client- or staff-owned dogs between 1 and 10 years of age with no clinical evidence of orthopedic disease, myelopathy, or neuromuscular disease. METHODS: Descriptive study. The intraobserver proportion of agreement (%) of responses to selected segmental reflexes in right versus left limbs by 3 observers was calculated and reported. The interobserver agreement of 2 observers of responses to selected reflexes was estimated by calculating proportions of agreement, kappa values, and 95% confidence intervals. A segmental reflex with an acceptable agreement was defined as that with a proportion of agreement ≥90% and a Kappa value ≥0.61 in both limbs. RESULTS: The intraobserver proportion of agreement for all 3 observers was high (≥95%) for the extensor carpi radialis, withdrawal, patellar, and cranial tibial reflexes. Between observers 1 and 3 and observers 2 and 3, the interobserver proportion of agreement was high (≥ 92%) for the extensor carpi radialis (κ 0.66, not determined [ND]), withdrawal (both limbs, κ ND), patellar (κ ND), and cranial tibial reflexes (κ ND). CONCLUSIONS AND CLINICAL IMPORTANCE: The extensor carpi radialis, withdrawal, patellar, and cranial tibial reflexes had a higher proportion of agreement and kappa values between 2 observers.

Subjects

Dog Diseases; Spinal Cord Diseases; Humans; Dogs; Animals; Observer Variation; Reflex; Extremities; Spinal Cord Diseases/veterinary; Reproducibility of Results

11.

Regard to assessing agreement between two raters with kappa statistics.

Yu, Tianfei; Ren, Bingrui; Li, Ming.

Int J Cardiol ; 403: 131896, 2024 May 15.

Article in English | MEDLINE | ID: mdl-38387729

Subjects

Reproducibility of Results; Humans; Observer Variation

12.

Neck circumference is a highly reliable anthropometric measure in older adults requiring long-term care.

Sato, Ryo; Sawaya, Yohei; Ishizaka, Masahiro; Yin, Lu; Shiba, Takahiro; Hirose, Tamaki; Urano, Tomohiko.

PeerJ ; 12: e16816, 2024.

Article in English | MEDLINE | ID: mdl-38313007

ABSTRACT

The reliability of neck circumference measurement as an assessment tool for older adults requiring long-term care remains unknown. This study aimed to evaluate the reliability of neck circumference measurement in older adults requiring long-term care, and the effect of edema on measurement error. Two physical therapists measured the neck circumference. Intraclass correlation coefficient (ICC) and Bland-Altman analyses were performed to examine the reliability of neck circumference measurement. Correlation analysis was used to evaluate the relationship between edema values (extracellular water/total body water) and neck circumference measurement difference. For inter-rater reliability of neck circumference measurement, the overall ICC (2,1) was 0.98. The upper and lower limits of the difference between examiners ranged from -0.9 to 1.2 cm. There was no association between edema values and neck circumference measurement error. Thus, measurement of the neck circumference in older adults requiring long-term care is a reliable assessment tool, with a low error rate, even in older adults with edema.
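
The Bland-Altman limits quoted above can be reproduced in outline with the sketch below. It assumes the conventional definition (mean difference plus or minus 1.96 standard deviations of the paired differences); the function name and the measurements are hypothetical and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement use the conventional bias +/- 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical neck circumferences (cm) recorded by two examiners.
examiner_1 = [34.0, 36.5, 31.2, 38.0, 35.1]
examiner_2 = [33.5, 36.9, 31.0, 37.2, 35.4]
bias, lower, upper = bland_altman(examiner_1, examiner_2)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

A bias close to zero with narrow limits indicates that the two examiners can be used interchangeably for practical purposes; how narrow is narrow enough is a clinical judgment rather than a statistical one.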

Subjects

Long-Term Care; Neck; Humans; Aged; Observer Variation; Reproducibility of Results; Edema/diagnosis

13.

Augmented interpretation of HER2, ER, and PR in breast cancer by artificial intelligence analyzer: enhancing interobserver agreement through a reader study of 201 cases.

Jung, Minsun; Song, Seung Geun; Cho, Soo Ick; Shin, Sangwon; Lee, Taebum; Jung, Wonkyung; Lee, Hajin; Park, Jiyoung; Song, Sanghoon; Park, Gahee; Song, Heon; Park, Seonwook; Lee, Jinhee; Kang, Mingu; Park, Jongchan; Pereira, Sergio; Yoo, Donggeun; Chung, Keunhyung; Ali, Siraj M; Kim, So-Woon.

Breast Cancer Res ; 26(1): 31, 2024 Feb 23.

Article in English | MEDLINE | ID: mdl-38395930

ABSTRACT

BACKGROUND: Accurate classification of breast cancer molecular subtypes is crucial in determining treatment strategies and predicting clinical outcomes. This classification largely depends on the assessment of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) status. However, variability in interpretation among pathologists poses challenges to the accuracy of this classification. This study evaluates the role of artificial intelligence (AI) in enhancing the consistency of these evaluations. METHODS: AI-powered HER2 and ER/PR analyzers, consisting of cell and tissue models, were developed using 1,259 HER2, 744 ER, and 466 PR-stained immunohistochemistry (IHC) whole-slide images of breast cancer. An external validation cohort comprising HER2, ER, and PR IHCs of 201 breast cancer cases was analyzed with these AI-powered analyzers. Three board-certified pathologists independently assessed these cases without AI annotation. Then, cases with differing interpretations between pathologists and the AI analyzer were revisited with AI assistance, focusing on evaluating the influence of AI assistance on the concordance among pathologists during the revised evaluation compared to the initial assessment. RESULTS: Reevaluation was required in 61 (30.3%), 42 (20.9%), and 80 (39.8%) of HER2, in 15 (7.5%), 17 (8.5%), and 11 (5.5%) of ER, and in 26 (12.9%), 24 (11.9%), and 28 (13.9%) of PR evaluations by the pathologists, respectively. Compared to initial interpretations, the assistance of AI led to a notable increase in the agreement among three pathologists on the status of HER2 (from 49.3 to 74.1%, p < 0.001), ER (from 93.0 to 96.5%, p = 0.096), and PR (from 84.6 to 91.5%, p = 0.006). This improvement was especially evident in cases of HER2 2+ and 1+, where the concordance significantly increased from 46.2 to 68.4% and from 26.5 to 70.7%, respectively.
Consequently, a refinement in the classification of breast cancer molecular subtypes (from 58.2 to 78.6%, p < 0.001) was achieved with AI assistance. CONCLUSIONS: This study underscores the significant role of AI analyzers in improving pathologists' concordance in the classification of breast cancer molecular subtypes.

Subjects

Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnosis; Breast Neoplasms/metabolism; Receptors, Estrogen/metabolism; Biomarkers, Tumor/metabolism; Artificial Intelligence; Observer Variation; Receptors, Progesterone/metabolism; Receptor, ErbB-2/metabolism

14.

Large-scale analysis of interobserver agreement and reliability in cardiotocography interpretation during labor using an online tool.

Ben M'Barek, Imane; Ben M'Barek, Badr; Jauvion, Grégoire; Holmström, Emilia; Agman, Antoine; Merrer, Jade; Ceccaldi, Pierre-François.

BMC Pregnancy Childbirth ; 24(1): 136, 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38355457

ABSTRACT

BACKGROUND: While the effectiveness of cardiotocography in reducing neonatal morbidity is still debated, it remains the primary method for assessing fetal well-being during labor. Evaluating how accurately professionals interpret cardiotocography signals is essential for its effective use. The objective was to evaluate the accuracy of fetal hypoxia prediction by practitioners through the interpretation of cardiotocography signals and clinical variables during labor. MATERIAL AND METHODS: We conducted a cross-sectional online survey, involving 120 obstetric healthcare providers from several countries. One hundred cases, including fifty cases of fetal hypoxia, were randomly assigned to participants who were invited to predict the fetal outcome (binary criterion of pH with a threshold of 7.15) based on the cardiotocography signals and clinical variables. After describing the participants, we calculated (with a 95% confidence interval) the success rate, sensitivity and specificity to predict the fetal outcome for the whole population and according to pH ranges, professional groups and number of years of experience. Interobserver agreement and reliability were evaluated using the proportion of agreement and Cohen's kappa respectively. RESULTS: The overall ability to predict a pH level below 7.15 yielded a success rate of 0.58 (95% CI 0.56-0.60), a sensitivity of 0.58 (95% CI 0.56-0.60) and a specificity of 0.63 (95% CI 0.61-0.65). No significant difference in the success rates was observed with respect to profession and number of years of experience. The success rate was higher for the cases with a pH level below 7.05 (0.69) and above 7.20 (0.66) compared to those falling between 7.05 and 7.20 (0.48). The proportion of agreement between participants was good (0.82), with an overall kappa coefficient indicating substantial reliability (0.63). 
CONCLUSIONS: The use of an online tool enabled us to collect a large amount of data to analyze how practitioners interpret cardiotocography data during labor. Despite a good level of agreement and reliability among practitioners, the overall accuracy is poor, particularly for cases with a neonatal pH between 7.05 and 7.20. Factors such as profession and experience level do not have a notable impact on the accuracy of the annotations. The implementation and use of computerized cardiotocography analysis software has the potential to enhance the accuracy of fetal hypoxia detection, especially for ambiguous cardiotocography tracings.
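
The accuracy measures reported in this abstract (success rate, sensitivity, specificity, each with a 95% confidence interval) can be sketched as follows. The abstract does not state which interval method was used; the normal-approximation (Wald) interval below is an assumption, and the function names and predictions are hypothetical.

```python
from math import sqrt


def proportion_ci(successes, total, z=1.96):
    """Proportion with a Wald (normal-approximation) interval, clipped to [0, 1]."""
    p = successes / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_summary(predicted, actual):
    """predicted/actual: sequences of booleans, True meaning pH below the threshold."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return {
        "success_rate": proportion_ci(tp + tn, len(pairs)),
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
    }


# Hypothetical annotations, not data from the survey.
predicted = [True, True, False, False, True, False, True, False]
actual = [True, False, False, False, True, True, True, False]
for name, (estimate, low, high) in diagnostic_summary(predicted, actual).items():
    print(name, round(estimate, 2), round(low, 2), round(high, 2))
```

With very small samples or proportions near 0 or 1, a Wilson or exact interval is usually preferred over the Wald form shown here.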

Subjects

Cardiotocography; Fetal Hypoxia; Pregnancy; Infant, Newborn; Female; Humans; Cardiotocography/methods; Fetal Hypoxia/diagnosis; Observer Variation; Reproducibility of Results; Cross-Sectional Studies; Heart Rate, Fetal

15.

CD, or not CD, that is the question: a digital interobserver agreement study in coeliac disease.

Denholm, James; Schreiber, Benjamin A; Jaeckle, Florian; Wicks, Mike N; Benbow, Emyr W; Bracey, Tim S; Chan, James Y H; Farkas, Lorant; Fryer, Eve; Gopalakrishnan, Kishore; Hughes, Caroline A; Kirkwood, Kathryn J; Langman, Gerald; Mahler-Araujo, Betania; McMahon, Raymond F T; Myint, Khun La Win; Natu, Sonali; Robinson, Andrew; Sanduka, Ashraf; Sheppard, Katharine A; Tsang, Yee Wah; Arends, Mark J; Soilleux, Elizabeth J.

BMJ Open Gastroenterol ; 11(1), 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38302475

ABSTRACT

OBJECTIVE: Coeliac disease (CD) diagnosis generally depends on histological examination of duodenal biopsies. We present the first study analysing the concordance in examination of duodenal biopsies using digitised whole-slide images (WSIs). We further investigate whether the inclusion of immunoglobulin A tissue transglutaminase (IgA tTG) and haemoglobin (Hb) data improves the interobserver agreement of diagnosis. DESIGN: We undertook a large study of the concordance in histological examination of duodenal biopsies using digitised WSIs in an entirely virtual reporting setting. Our study was organised in two phases: in phase 1, 13 pathologists independently classified 100 duodenal biopsies (40 normal; 40 CD; 20 indeterminate enteropathy) in the absence of any clinical or laboratory data. In phase 2, the same pathologists examined the (re-anonymised) WSIs with the inclusion of IgA tTG and Hb data. RESULTS: We found the mean probability of two observers agreeing in the absence of additional data to be 0.73 (±0.08) with a corresponding Cohen's kappa of 0.59 (±0.11). We further showed that the inclusion of additional data increased the concordance to 0.80 (±0.06) with a Cohen's kappa coefficient of 0.67 (±0.09). CONCLUSION: We showed that the addition of serological data significantly improves the quality of CD diagnosis. However, the limited interobserver agreement in CD diagnosis using digitised WSIs, even after the inclusion of IgA tTG and Hb data, indicates the importance of interpreting duodenal biopsy in the appropriate clinical context. It further highlights the unmet need for an objective means of reproducible duodenal biopsy diagnosis, such as the automated analysis of WSIs using artificial intelligence.
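
The pairwise figures in the Results above (mean probability of two observers agreeing, and the corresponding Cohen's kappa) can be sketched by averaging over every pair of observers. The sketch assumes the reported ± values are standard deviations across pairs, which the abstract does not state; the function names and classifications are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (p_obs - p_exp) / (1 - p_exp)


def pairwise_summary(ratings_by_observer):
    """Mean and SD of raw agreement and of Cohen's kappa over all observer pairs."""
    agreements, kappas = [], []
    for a, b in combinations(ratings_by_observer, 2):
        agreements.append(sum(x == y for x, y in zip(a, b)) / len(a))
        kappas.append(cohen_kappa(a, b))
    return (mean(agreements), stdev(agreements)), (mean(kappas), stdev(kappas))


# Hypothetical classifications: N = normal, CD = coeliac disease, I = indeterminate.
observers = [
    ["N", "N", "CD", "CD", "I", "N"],
    ["N", "N", "CD", "I", "I", "N"],
    ["N", "CD", "CD", "CD", "I", "N"],
]
(agree_mean, agree_sd), (kappa_mean, kappa_sd) = pairwise_summary(observers)
print(round(agree_mean, 2), round(kappa_mean, 2))
```

Running the same summary on the phase 1 and phase 2 classifications would give the before-and-after comparison described in the abstract.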

Subjects

Celiac Disease, Humans, Celiac Disease/diagnosis, Transglutaminases, Artificial Intelligence, Observer Variation, Immunoglobulin A

16.

Reliability of landmark identification for analysis of the temporomandibular joint in real-time MRI.

Mouchoux, Jérémy; Meyer-Marcotty, Philipp; Sojka, Florian; Dechent, Peter; Klenke, Daniela; Wiechens, Bernhard; Quast, Anja.

Head Face Med ; 20(1): 10, 2024 Feb 17.

Article in English | MEDLINE | ID: mdl-38365709

ABSTRACT

BACKGROUND: Real-time magnetic resonance imaging (rtMRI) is essential for diagnosing and comprehending temporomandibular joint (TMJ) movements. Current methods for tracking and analysis require manual landmark placement on each acquisition frame. Therefore, our study aimed to assess the inter- and intra-rater reliability of placing cephalometric landmarks in frames from a dynamic real-time TMJ MRI. MATERIAL AND METHODS: Four real-time MRIs of the right TMJ were taken during mandibular movement at ten frames per second. Seven dentists identified ten landmarks on two frames (intercuspal position, ICP, and maximum mouth opening, MMO) twice at a two-week interval, yielding 112 tracings. Six typical cephalometric measurements (angles and distances) were derived from these landmarks. The reliabilities of landmarks and measurements were evaluated using distance-based (dbICC), linear mixed effect model intraclass correlation (lmeICC), and standard ICC. RESULTS: The average inter-rater reliability for the landmarks was 0.92 (dbICC) and 0.93 (lmeICC). The corresponding intra-rater reliability scores were 0.97 and 0.98. Over 80% of the landmarks showed an ICC greater than 0.98 (inter-rater) and over 0.99 (intra-rater). The lowest landmark ICCs were observed for the orbitale and the oblique ridge of the mandibular ramus. However, the cephalometric angle and distance measurements derived from these landmarks showed only moderate to good reliability, and measurements performed in the ICP frame were more reliable than those in the MMO frame. CONCLUSION: While dentists reliably localize isolated landmarks in real-time MRIs, the cephalometric measurements derived from them remain inconsistent. The better results in ICP than MMO are probably due to a more familiar jaw position. The higher error rate of the TMJ measurements in MMO could be associated with a lack of training in real-time MRI analysis in dentistry.

Subjects

Magnetic Resonance Imaging, Temporomandibular Joint, Humans, Reproducibility of Results, Temporomandibular Joint/diagnostic imaging, Magnetic Resonance Imaging/methods, Mandible, Cephalometry/methods, Observer Variation

17.

Application of computer-aided detection for NCCN-based follow-up recommendation in subsolid nodules: Effect on inter-observer agreement.

Quanyang, Wu; Lina, Zhou; Yao, Huang; Jiawei, Wang; Wei, Tang; Linlin, Qi; Zewei, Zhang; Donghui, Hou; Hongjia, Li; Shuluan, Chen; Jiaxing, Zhang; Shijun, Zhao.

Cancer Med ; 13(2): e6967, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38348960

ABSTRACT

RATIONALE AND OBJECTIVES: Computer-aided detection (CAD) of pulmonary nodules reduces the impact of observer variability, improving the reliability and reproducibility of nodule assessments in clinical practice. Therefore, this study aimed to assess the impact of CAD on inter-observer agreement in the follow-up management of subsolid nodules. MATERIALS AND METHODS: A dataset comprising 60 subsolid nodule cases was constructed based on the National Cancer Center lung cancer screening data. Five observers independently assessed all low-dose computed tomography scans and assigned follow-up management strategies to each case according to the National Comprehensive Cancer Network (NCCN) guidelines, using both manual measurements and CAD assistance. The linearly weighted Cohen's kappa test was used to measure agreement between paired observers. Agreement among multiple observers was evaluated using the Fleiss kappa statistic. RESULTS: The agreement of the five observers for NCCN follow-up management categorization was moderate when measured manually, with a Fleiss kappa of 0.437. With CAD, agreement rose to a substantial level, with a Fleiss kappa of 0.623. After using CAD, the proportions of major and substantial management discrepancies decreased from 27.5% to 15.8% and from 4.8% to 1.5%, respectively (p < 0.01). In 23 lung cancer cases presenting as part-solid nodules, CAD significantly increased the average detection sensitivity (overall sensitivity, 82.6% vs. 92.2%; p < 0.05). CONCLUSION: The application of CAD significantly improves inter-observer agreement in the follow-up management strategy for subsolid nodules. It also demonstrates the potential to reduce substantial management discrepancies and increase detection sensitivity in lung cancer cases presenting as part-solid nodules.
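
The multi-observer statistic named in this abstract can be sketched in a few lines of Python. This is a minimal illustration of Fleiss' kappa, assuming ratings are supplied as a per-case count table; the function name `fleiss_kappa` and the toy data are hypothetical and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-case agreement: fraction of rater pairs that chose the same category.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1.0:
        return float("nan")         # all ratings in one category: undefined
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 2 cases, 5 observers, 2 management categories.
split_votes = [[3, 2], [2, 3]]
unanimous = [[5, 0], [0, 5]]
```

With the toy tables above, `fleiss_kappa(unanimous)` is 1.0 and `fleiss_kappa(split_votes)` is slightly negative, meaning agreement below what chance alone would produce.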

Subjects

Lung Neoplasms, Humans, Lung Neoplasms/diagnostic imaging, Reproducibility of Results, Early Detection of Cancer, Observer Variation, Follow-Up Studies, Computers

18.

Torsobarography: Intra-Observer Reliability Study of a Novel Posture Analysis Based on Pressure Distribution.

Stecher, Nico; Heinke, Andreas; Zurawski, Arkadiusz Lukasz; Harder, Maximilian Robert; Schumann, Paula; Jochim, Thurid; Malberg, Hagen.

Sensors (Basel) ; 24(3)2024 Jan 24.

Article in English | MEDLINE | ID: mdl-38339484

ABSTRACT

Postural deformities often manifest themselves in a sagittal imbalance and an asymmetric morphology of the torso. As a novel topographic method, torsobarography assesses the morphology of the back by analysing pressure distribution along the torso in a lying position. At torsobarography's core is a capacitive pressure sensor array. To evaluate its feasibility as a diagnostic tool, the reproducibility of the system and of the extracted anatomically associated parameters was evaluated on 40 subjects. Landmarks and reference distances were identified within the pressure images. The examined parameters describe the shape of the spine, the symmetry of various trunk structures, such as the scapulae, and the pelvic posture. The results showed that the localisation of the different structures achieved good (ICC > 0.75) to excellent (ICC > 0.90) reliability. In particular, parameters for approximating the sagittal spine shape were reliably reproduced (ICC > 0.83). Lower reliability was observed for asymmetry parameters, which can be related to the low variability within the subject group. Nonetheless, the reliability levels of selected parameters are comparable to commercial systems. This study demonstrates the substantial potential of torsobarography at its current stage for reliable posture analysis and may pave the way for its use as an early detection system for postural deformities.

Subjects

Posture, Spine, Humans, Reproducibility of Results, Observer Variation, Pelvis

19.

Intra and inter-rater reproducibility of the Remote Static Posture Assessment (ARPE) protocol's Postural Checklist.

Pilling, Betiane Moreira; Candotti, Cláudia Tarragô; Silva, Marcelle Guimarães; Frantz, Marina Ziegler; Noll, Matias.

PLoS One ; 19(2): e0297506, 2024.

Article in English | MEDLINE | ID: mdl-38335201

ABSTRACT

With the enforcement of social distancing due to the pandemic, a need arose to conduct postural assessments through remote care. This study therefore aimed to assess the intra- and inter-rater reproducibility of the Remote Static Posture Assessment (ARPE) protocol's Postural Checklist. The study involved 51 participants, with the postural assessment conducted by two researchers. For intra-rater reproducibility assessment, one rater administered the ARPE protocol twice, with an interval of 7 days between assessments (test-retest). A second independent rater assessed inter-rater reproducibility. Kappa statistics (k) and percentage agreement (%C) were used, with a significance level of 0.05. The intra-rater reproducibility analysis indicated high reliability: k values varied from 0.921 to 1.0, with %C ranging from 94% to 100% for all items on the ARPE protocol's Postural Checklist. Inter-rater reproducibility ranged from slight to good: k values exceeded 0.4 for the entire checklist, except for four items: waists in the frontal photograph (k = 0.353), scapulae in the rear photograph (k = 0.310), popliteal line of the knees in the rear photograph (k = 0.270), and foot posture in the rear photograph (k = 0.271). Nonetheless, %C surpassed 50% for all but the scapulae item (%C = 47%). The ARPE protocol's Postural Checklist is reproducible and can be administered by the same or different raters for static posture assessment. However, when used by distinct raters, the items waists (front of the frontal plane), scapulae, popliteal line of the knees, and feet (rear of the frontal plane) should not be considered.

Subjects

Checklist, Posture, Humans, Reproducibility of Results, Observer Variation

20.

TB or not TB? Diagnostic Sensitivity, Specifity and Interobserver Agreement in the Radiological Diagnosis of Pulmonary Tuberculosis in Children.

Brinkmann, Folke; Hofgrefe, Jana; Ahrens, Frank; Weidemann, Jürgen; Berthold, Lars Daniel; Schwerk, Nicolaus.

Klin Padiatr ; 236(2): 123-128, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38320580

ABSTRACT

BACKGROUND: The differentiation between latent tuberculosis infection (LTBI) and tuberculosis (TB) relies on radiological changes. Confirming the diagnosis remains a challenge because typical findings are often missing in children. This study evaluates diagnostic sensitivity, specificity and interobserver agreement on the radiological diagnosis of TB by chest x-rays according to professional specialization and work experience. METHODS: Chest x-rays of 120 children with proven tuberculosis infection were independently evaluated by general radiologists, paediatric radiologists and paediatric pulmonologists. Results were compared to a reference diagnosis created by a group of experienced paediatric radiologists and paediatric pulmonologists. Primary endpoints were diagnostic sensitivity and specificity, and interobserver variability, defined as Krippendorff's alpha of these groups compared to the reference diagnosis. RESULTS: Of the 120 chest x-rays, 33 (27.5%) were diagnosed as TB by the reference standard. Paediatric pulmonologists had the highest diagnostic sensitivity (90%) but were less specific (71%), whereas general radiologists were less sensitive (68%) but more specific (95%). The best diagnostic accuracy was achieved by paediatric radiologists, with a diagnostic sensitivity of 77% and a specificity of 95%. CONCLUSIONS: We demonstrated significant interobserver variability and relevant differences in sensitivity and specificity in the radiological diagnosis of TB between the groups. Paediatric radiologists showed the best diagnostic performance. As the diagnosis of pulmonary TB has significant therapeutic consequences for children, paediatric radiologists should be routinely involved in the diagnostic process.

Subjects

Pulmonary Tuberculosis, Tuberculosis, Humans, Child, Observer Variation, Tuberculosis/diagnosis, Pulmonary Tuberculosis/diagnostic imaging, Sensitivity and Specificity
